The essential goal of R. A. Fisher’s fiducial argument was to make posterior
inferences  about  unknown  parameters  without  resorting  to  a  prior  distribution. 
Over  the  past  decade,  there  have  been  two  major  attempts  at  developing  a 
statistical theory that would accomplish this convincingly. One of these efforts
has  been  described  in  a  series  of  publications  by  Fraser,  the  other  in  papers  by 
Dempster.  From  the  early  work  [4],  [11],  [12],  [13],  which  was  tied  to  a 
fiducial  viewpoint,  both  authors  developed  statistical  theories  that  were  distinct 
from  the  fiducial  argument,  yet  achieved  the  goal  of  non-Bayesian  posterior 
inference  [5],  [6],  [7],  [8],  [14],  [15],  [16]. 

Despite  technical  and  other  differences,  the  main  ideas  underlying  this  later 
work  by  Dempster  and  by  Fraser  appear  to  be  similar.  Fraser’s  papers,  analyzing 
statistical  models  that  possess  a  special  kind  of  structure,  arrive  at  “structural 
probability”  distributions  for  the  unknown  parameters.  Dempster’s  papers, 
dealing  with  less  specialized  models,  derive  “upper  and  lower  probabilities”  on 
the  parameter  space.  Disregarding  some  technicalities,  these  upper  and  lower 
probabilities  reduce  to  structural  probabilities  for  the  models  considered  by 
Fraser. 

To  this  extent,  upper  and  lower  probabilities  are  a  generalization  of  structural 
probabilities.  However,  there  appear  to  be  differences  in  interpretation.  Fraser 
has given a frequency interpretation to structural probabilities in [11], [12] (but
not  in  later  work);  this  interpretation  depends  upon  the  special  form  of  the 
statistical  models  in  his  theory,  and  does  not  apply  to  Dempster’s  theory. 
Dempster has provided no simple interpretation for upper and lower probabilities;
he suggested in [7] that his theory might be “an acceptable idealization of
intuitive inferential ‘appreciations’.” More recently, he has embedded his theory
within a generalized Bayesian framework [9], [10]. The justification for the latter
is  unclear  at  present  (see  the  discussion  to  [9]). 

Lacking  in  both  the  Dempster  and  Fraser  theories  are  systematic  methods  for 
dealing  with  estimation  and  hypothesis  testing  problems  (or  suitable  analogues 
of  such).  A  method  of  constructing  tests  was  described  by  Fraser  in  [16],  but  no 
performance  criteria  were  established.  Dempster  [5]  defined  upper  and  lower 
risks but did not pursue their application; the statistical meaning of these risks
is not evident under his interpretation of upper and lower probabilities. Since
even  simple  models  suggest  a  variety  of  natural  estimates  and  tests,  some  theory 
seems  necessary  as  a  guide  to  choice  of  procedure. 

The  results  presented  in  this  paper  proceed  in  several  directions.  A  statistical 
interpretation  for  upper  and  lower  probabilities  and  risks  is  described  in  Section 
2; this rationale leads naturally to a minimax criterion for statistical procedures
and, in principle, to an alternative to standard decision theory. The desirability of
such  an  alternative  stems  from  well-known  awkward  features  of  standard 
decision  theory,  such  as  the  possibility  that  a  test  of  low  size  and  high  power 
may  make  a  decision  which  is  contradicted  by  the  data  (see  Hacking  [17]).  A 
heuristic  account  of  these  ideas  in  a  less  general  context  has  previously  been 
given by the author in [1].

Section  3  of  the  paper  develops  basic  mathematical  properties  of  upper  and 
lower  probabilities  and  risks  in  the  light  of  Choquet’s  [3]  theory  of  capacities. 
The  results  include  extensions  of  properties  given  by  Dempster  in  [6]. 

In Section 4, convenient conditions are established for the existence of minimax
procedures (as defined in Section 2). An example in a nonparametric setting
follows. 

2.  Statistical  background 

An  experiment  is  performed,  resulting  in  observation  x.  It  is  known  that  the 
observed  x  was  generated  from  a  parameter  value  t  and  a  realized  random 
variable  e  by  the  mapping 
(2.1)  $x = \xi(e, t)$.

Moreover, t lies in a parameter space T, x lies in an observation space X, and e is
realized according to a probability measure P on an elementary space E. Both
P and the mapping $\xi$ are known. The problem is to draw inferences concerning
t from x and the model.

The following formal assumptions are made: X is a Borel subset of a metric
space and is endowed with the σ-algebra $\mathcal{X}$ of all Borel sets. T and E are complete
separable metric spaces, endowed with σ-algebras $\mathcal{T}$ and $\mathcal{E}$, respectively. $\mathcal{T}$
consists of all Borel sets in T. P is defined on the Borel sets in E and $\mathcal{E}$ is the
completion with respect to P of the σ-algebra of these Borel sets; thus $\mathcal{E}$ contains
all analytic sets. The function $\xi : E \times T \to X$ is Borel measurable.

Formally,  performing  the  experiment  described  above  amounts  to  realizing, 
through physical operations, a specific triple $(x, t, e) \in X \times T \times E$. Before the
experiment is carried out (or the outcome x is noted), the following prospective
assertions can be made about the triple to be realized: the chance that $e \in B$,
$B \in \mathcal{E}$, is P(B); t is an unspecified element of T; the observable x is related to t
and  e  through  (2.1). 

Once the experiment has been performed and x has been observed, the particular
triple (x, t, e) that was realized can be described more precisely. If

(2.2)  $T_x(e) = \{t \in T : x = \xi(e, t)\}$,

it is evident that the e realized in the experiment must lie in

(2.3)  $E_x = \{e \in E : T_x(e) \neq \emptyset\}$,

and  whatever  that  e  is,  the  realized  t  must  belong  to  the  corresponding  Tx(e). 

Since $E_x = \mathrm{proj}_E[\xi^{-1}(x)]$, $E_x$ is analytic under the assumptions and so lies
in $\mathcal{E}$. Let $P[B \mid E_x]$ denote the conditional probability defined by

(2.4)  $P[B \mid E_x] = P[B \cap E_x] / P[E_x]$,  $B \in \mathcal{E}$,

provided $P[E_x] > 0$. If $P[E_x] = 0$, it may still be possible to condition upon
a suitable statistic. In any event, a modification of $\xi$ so as to include round-off
error incurred in observing x will generally result in $P[E_x] > 0$.
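The constructions (2.2) to (2.4) can be made concrete on a small finite model. The sketch below is only an illustration under stated assumptions: the spaces `E` and `T`, the uniform measure `P`, and the map `xi(e, t) = max(e, t)` are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction

# Hypothetical finite model (an assumption for illustration, not from the paper).
E = [0, 1, 2, 3]                         # elementary space
T = [0, 1, 2]                            # parameter space
P = {e: Fraction(1, 4) for e in E}       # uniform probability measure on E

def xi(e, t):
    """Assumed data-generating map x = xi(e, t)."""
    return max(e, t)

def T_x(x, e):
    """(2.2): parameter values consistent with observation x and error e."""
    return frozenset(t for t in T if xi(e, t) == x)

def E_x(x):
    """(2.3): error values that could have produced x for some t."""
    return [e for e in E if T_x(x, e)]

def cond_prob(B, x):
    """(2.4): P[B | E_x], defined only when P[E_x] > 0."""
    Ex = E_x(x)
    total = sum(P[e] for e in Ex)
    if total == 0:
        raise ValueError("P[E_x] = 0; conditioning is undefined here")
    return sum(P[e] for e in Ex if e in B) / total

x_obs = 2
print(E_x(x_obs))
print({e: sorted(T_x(x_obs, e)) for e in E_x(x_obs)})
print(cond_prob({2}, x_obs))
```

In this toy model the observation x = 2 rules out e = 3, and the surviving error values are reweighted to sum to one.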

Thus,  after  the  experiment  has  been  performed  and  x  has  been  observed,  the 
following prospective statements can be made about the realized triple (x, t, e):
x is as observed; $e \in E_x$ and the chance that $e \in B$, $B \in \mathcal{E}$, is $P[B \mid E_x]$; whatever
e is, t is an unspecified element of the corresponding set $T_x(e)$; relation (2.1) is
necessarily satisfied. This collection of assertions about the triple (x, t, e) will
be called the posterior model $\mathcal{M}_x$ for the experiment. Both Dempster and Fraser
have  previously  considered  reductions  of  this  type,  though  not  in  terms  of 
experimental  triples. 

Since the realized experiment (x, t, e) is described more precisely by the posterior
model $\mathcal{M}_x$ than by the original model, it is proposed to evaluate statistical
procedures of interest by their average behavior over a hypothetical sequence of
independent experiments, each of which is generated under the assumptions of
$\mathcal{M}_x$. The aim is to measure how well a statistical procedure performs when
applied  to  hypothetical  experimental  triples  that  are  as  similar  as  can  be  arranged 
to  the  actual  triple  (x,  t,  e). 

Let D denote a space of decisions and let $\ell : T \times D \to R^+$ be a nonnegative
loss function. Let $\mathcal{R}^+$ denote the σ-algebra of all Borel sets in $R^+$, and assume
that for every $d \in D$, $\ell(\cdot, d)$ is a measurable mapping of $(T, \mathcal{T})$ into $(R^+, \mathcal{R}^+)$.
Suppose $d \in D$ is a specific decision whose consequences are to be evaluated
relative to the posterior model $\mathcal{M}_x$ under the loss function $\ell$.

Let $\{(x, t_i, e_i),\ i = 1, 2, \ldots\}$ be a sequence of independent hypothetical experiments
generated under the posterior model; in other words, $e_1, e_2, \ldots$ are
independent random variables, each distributed according to $P[\,\cdot \mid E_x]$, $t_i$ is
selected arbitrarily from $T_x(e_i)$, and x is the observed data. For each i, the equation
$x = \xi(e_i, t_i)$ will necessarily be satisfied.

Let the general notation $\mathrm{prop}_n(\pi_i)$ denote the proportion of true propositions
among the propositions $\{\pi_1, \pi_2, \ldots, \pi_n\}$. The average loss incurred over the
first n hypothetical experiments as a result of taking decision d is $n^{-1} \sum_{i=1}^{n} \ell(t_i, d)$.
Since $\ell \geq 0$,

(2.5)  $\frac{1}{n} \sum_{i=1}^{n} \ell(t_i, d) = \int_0^\infty \mathrm{prop}_n[\ell(t_i, d) > z]\, dz$.

If $A(z, d) = \{t \in T : \ell(t, d) > z\}$, then $A(z, d) \in \mathcal{T}$ for every $z \in R^+$ and $d \in D$,
and $\{\ell(t_i, d) > z\} = \{t_i \in A(z, d)\}$. Therefore,

(2.6)  $\int_0^\infty \mathrm{prop}_n[T_x(e_i) \subset A(z, d),\ T_x(e_i) \neq \emptyset]\, dz \leq \frac{1}{n} \sum_{i=1}^{n} \ell(t_i, d) \leq \int_0^\infty \mathrm{prop}_n[T_x(e_i) \cap A(z, d) \neq \emptyset]\, dz$.
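The sandwich (2.6) can be checked by simulation. The following is a minimal sketch, assuming a hypothetical finite posterior model in which each set $T_x(e)$ is a small tuple of parameter values with a given conditional probability, and assuming absolute-error loss. For finite sets, the left and right integrals in (2.6) reduce to averages of the smallest and largest loss over each drawn set.

```python
import random

# Hypothetical posterior model: pairs (conditional probability, focal set T_x(e)).
focal = [(0.5, (0, 1)), (0.3, (1, 2, 3)), (0.2, (3,))]

def loss(t, d):
    """Assumed absolute-error loss."""
    return abs(t - d)

def simulate(d, n, rng):
    """Draw n hypothetical experiments; return (lower bound, average loss, upper bound)."""
    weights = [p for p, _ in focal]
    sets = [S for _, S in focal]
    lo = avg = hi = 0.0
    for _ in range(n):
        S = rng.choices(sets, weights=weights)[0]   # e_i drawn from P[. | E_x]
        t = rng.choice(S)                           # t_i picked arbitrarily in T_x(e_i)
        losses = [loss(u, d) for u in S]
        lo += min(losses)      # contribution to the left side of (2.6)
        avg += loss(t, d)      # realized loss
        hi += max(losses)      # contribution to the right side of (2.6)
    return lo / n, avg / n, hi / n

lo, avg, hi = simulate(d=1, n=10_000, rng=random.Random(0))
print(lo, avg, hi)
```

The ordering `lo <= avg <= hi` holds for every sample, whatever rule is used to pick `t` inside each set; only the limits of `lo` and `hi` are determined by the model.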

Now, 

(2.7)  $\int_0^\infty \mathrm{prop}_n[T_x(e_i) \cap A(z, d) \neq \emptyset]\, dz = \frac{1}{n} \sum_{i=1}^{n} \int_0^\infty I_{C_i}(z)\, dz$,

where $C_i = \{z \in R^+ : T_x(e_i) \cap A(z, d) \neq \emptyset\}$ and $I_{C_i}(z)$ is the indicator of $C_i$.
Moreover,

(2.8)  $\int_0^\infty I_{C_i}(z)\, dz = \sup_{t \in T_x(e_i)} \ell(t, d)$,

and for every $d \in D$, the function on the right of (2.8) is a measurable mapping
of $(E, \mathcal{E})$ into $(R^+, \mathcal{R}^+)$.

By  Fubini’s  theorem, 


(2.9)  $E \int_0^\infty I_{C_1}(z)\, dz = \int_0^\infty v_x[A(z, d)]\, dz$,

where for $A \in \mathcal{T}$,

(2.10)  $v_x(A) = P[e : T_x(e) \cap A \neq \emptyset \mid E_x]$

and the expectation is with respect to $P[\,\cdot \mid E_x]$.

Since $\{e : T_x(e) \cap A \neq \emptyset\} = \mathrm{proj}_E[\xi^{-1}(x) \cap (E \times A)]$, this set is analytic,
belongs to $\mathcal{E}$, and therefore $v_x(A)$ is defined. The strong law of large numbers,
applied to (2.7), shows that as $n \to \infty$, the upper bound in (2.6) converges with
probability one to

(2.11)  $s_x(\ell, d) = \int_0^\infty v_x[A(z, d)]\, dz$.

A dual argument shows that the lower bound in (2.6) converges with probability
one, as $n \to \infty$, to

(2.12)  $r_x(\ell, d) = \int_0^\infty u_x[A(z, d)]\, dz$,

where for $A \in \mathcal{T}$,

(2.13)  $u_x(A) = P[e : T_x(e) \subset A,\ T_x(e) \neq \emptyset \mid E_x]$.

Thus, the lower risk $r_x(\ell, d)$ and the upper risk $s_x(\ell, d)$ measure the smallest and
largest long run average loss that could be incurred as a consequence of decision
d. The evaluation is made under the posterior model $\mathcal{M}_x$. The relative desirability
of various decisions $d \in D$ may be assessed by reference to the corresponding
risks. More generally, a decision procedure $\delta : X \to D$ may be compared with
other decision procedures by studying the risks as functions on X.
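For a finite parameter space the upper and lower probabilities and risks can be computed exactly. The sketch below assumes a hypothetical posterior model given as a list `focal` of pairs (conditional probability, set $T_x(e)$) and an assumed absolute-error loss; the integrals over z are evaluated level by level, since the loss takes only finitely many values.

```python
from fractions import Fraction

T = [0, 1, 2]
# Hypothetical posterior model: (P[e | E_x], T_x(e)) pairs, illustrative numbers only.
focal = [(Fraction(2, 3), frozenset({2})),
         (Fraction(1, 3), frozenset({0, 1, 2}))]

def v(A):
    """Upper probability: chance that T_x(e) meets A."""
    return sum((p for p, S in focal if S & A), Fraction(0))

def u(A):
    """Lower probability: chance that the (nonempty) T_x(e) lies inside A."""
    return sum((p for p, S in focal if S <= A), Fraction(0))

def loss(t, d):
    """Assumed absolute-error loss."""
    return abs(t - d)

def layer_cake(cap, d):
    """Integral over z >= 0 of cap({t : loss(t, d) > z})."""
    levels = sorted({0} | {loss(t, d) for t in T})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        # On [lo, hi) the set {loss > z} does not change.
        A = frozenset(t for t in T if loss(t, d) > lo)
        total += (hi - lo) * cap(A)
    return total

def upper_risk(d):
    return layer_cake(v, d)

def lower_risk(d):
    return layer_cake(u, d)

for d in T:
    print(d, lower_risk(d), upper_risk(d))
```

In this finite setting the upper risk agrees with the expected largest loss over $T_x(e)$ and the lower risk with the expected smallest loss.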

For $\ell \geq 0$, $s_x(\ell, d)$ and $r_x(\ell, d)$ are equivalent to the upper and lower expectations
defined by Dempster in [5], [6]; $v_x$ and $u_x$ are the corresponding upper and
lower probabilities defined on $(T, \mathcal{T})$. A frequency interpretation for $u_x$, $v_x$ is
obtained by specializing $\ell$ in the foregoing; see [1] for the result.

The frequency interpretation for $r_x$ and $s_x$ suggests the following simple optimality
criterion.

Definition 2.1. A decision $d \in D$ is minimax under loss function $\ell$ and
observations x if $s_x(\ell, d) \leq s_x(\ell, d')$ for every $d' \in D$.

This  definition  differs  slightly  from  an  earlier  one  given  in  [1].  An  extension 
of  the  definition  to  decision  procedures  is 

Definition 2.2. A decision procedure $\delta : X \to D$ is minimax under loss
function $\ell$ if $s_x(\ell, \delta(x)) \leq s_x(\ell, \delta'(x))$ for every $x \in X$ and every $\delta' : X \to D$.

Finding  a  minimax  decision  procedure  amounts  to  finding  a  minimax  decision 
for  each  x  e  X.  The  existence  of  minimax  decisions  is  discussed  in  Section  4. 
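A minimax decision in the sense of Definition 2.1 can be approximated numerically. The sketch below makes several assumptions that are not in the paper: each set $T_x(e)$ is a closed interval [a, b], the loss is squared error, and the decision space is replaced by a finite grid. Under squared error the largest loss over an interval is attained at one of its endpoints.

```python
def upper_risk(d, focal):
    """Expected value of sup over t in [a, b] of (t - d)^2, for interval-valued T_x(e)."""
    return sum(p * max((a - d) ** 2, (b - d) ** 2) for p, a, b in focal)

def minimax_decision(focal, grid):
    """Grid point minimizing the upper risk (Definition 2.1 restricted to the grid)."""
    return min(grid, key=lambda d: upper_risk(d, focal))

# Hypothetical posterior model: (probability, left endpoint, right endpoint).
focal = [(0.5, 0.0, 1.0), (0.5, 0.5, 2.0)]
grid = [i / 100 for i in range(201)]     # decisions restricted to [0, 2]

d_star = minimax_decision(focal, grid)
print(d_star, upper_risk(d_star, focal))
```

With a single interval the grid search returns its midpoint, which is the familiar minimax estimate when nothing is known beyond the interval.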

3.  Formal  properties 

Several basic theorems about $u_x$, $v_x$, $r_x$, $s_x$ are proved in this section. Some
of the results have been obtained for finite T by Dempster [6]. Further related
results, in different contexts, may also be found in Choquet [3], Huber [18],
and  Strassen  [19].  For  notational  convenience,  the  subscript  x  is  dropped 
throughout  the  rest  of  this  paper. 

Let $\phi$ be a real valued set function defined on $\mathcal{T}$. For $B, A_1, A_2, \ldots, A_p$ in
$\mathcal{T}$, let

(3.1)  $\Delta_p = \phi(B) - \sum_i \phi(B \cup A_i) + \sum_{i<j} \phi(B \cup A_i \cup A_j) - \cdots + (-1)^p \phi(B \cup A_1 \cup \cdots \cup A_p)$,

and let

(3.2)  $\nabla_p = \phi(B) - \sum_i \phi(B \cap A_i) + \sum_{i<j} \phi(B \cap A_i \cap A_j) - \cdots + (-1)^p \phi(B \cap A_1 \cap \cdots \cap A_p)$.

The sums in (3.1) and (3.2) are taken over all possible distinct combinations of
indices, excluding combinations that repeat indices. Following Choquet [3],
we say that $\phi$ is alternating of order p if $\Delta_p \leq 0$ for arbitrary $B, A_1, \ldots, A_p \in \mathcal{T}$,
and is monotone of order p if $\nabla_p \geq 0$ for arbitrary $B, A_1, \ldots, A_p \in \mathcal{T}$.
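The alternating sums (3.1) and (3.2) are easy to evaluate by brute force on a finite space. The sketch below assumes a hypothetical random-set model `focal` on a three-point space and checks, for p = 1, 2, 3, that the upper probability built from it satisfies $\Delta_p \leq 0$ and the lower probability satisfies $\nabla_p \geq 0$ for every choice of sets.

```python
from fractions import Fraction
from itertools import combinations, product

T = frozenset({0, 1, 2})
# Hypothetical random-set model: (probability, focal set) pairs.
focal = [(Fraction(1, 2), frozenset({0})),
         (Fraction(1, 4), frozenset({1, 2})),
         (Fraction(1, 4), frozenset({0, 1, 2}))]

def v(A):
    return sum((p for p, S in focal if S & A), Fraction(0))

def u(A):
    return sum((p for p, S in focal if S <= A), Fraction(0))

def subsets(X):
    items = sorted(X)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def delta(phi, B, As):
    """(3.1): signed sum over sub-collections of As, joined to B by union."""
    total = Fraction(0)
    for k in range(len(As) + 1):
        for idx in combinations(range(len(As)), k):
            total += (-1) ** k * phi(B.union(*(As[i] for i in idx)))
    return total

def nabla(phi, B, As):
    """(3.2): signed sum over sub-collections of As, joined to B by intersection."""
    total = Fraction(0)
    for k in range(len(As) + 1):
        for idx in combinations(range(len(As)), k):
            total += (-1) ** k * phi(B.intersection(*(As[i] for i in idx)))
    return total

def check(p):
    """True if v is alternating and u is monotone of order p on all subsets of T."""
    all_sets = subsets(T)
    return all(delta(v, B, As) <= 0 and nabla(u, B, As) >= 0
               for B in all_sets for As in product(all_sets, repeat=p))

print([check(p) for p in (1, 2, 3)])
```

The exhaustive check is feasible only because the space is tiny; it is meant as a sanity check of the definitions, not as a proof.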

Proposition  3.1.  The  set  function  v  is  alternating  of  all  orders.  The  set 
function  u  is  monotone  of  all  orders. 
Proof (essentially due to Choquet). The probability $P[\,\cdot \mid E_x]$ is monotone
and alternating of all orders. Now

(3.3)  $v(A) = P[\psi(A) \mid E_x]$,  $A \in \mathcal{T}$,

where $\psi(A) = \mathrm{proj}_E[\xi^{-1}(x) \cap (E \times A)]$. If $A_1, A_2 \in \mathcal{T}$,

(3.4)  $\psi(A_1 \cup A_2) = \psi(A_1) \cup \psi(A_2)$;

therefore v is alternating of all orders. The complete monotonicity of u then
follows by property (c) of Proposition 3.2.

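Proposition 3.2. The set functions u and v defined on $\mathcal{T}$ have the following
properties:

(a) $u(\emptyset) = v(\emptyset) = 0$;

(b) $u(T) = v(T) = 1$;

(c) $u(A) + v(A^c) = 1$, where $A^c$ denotes the complement of A in T;

(d) $u(A) \leq v(A)$;

(e) if $A \subset B$, $u(A) \leq u(B)$ and $v(A) \leq v(B)$;

(f) $u(A \cup B) + u(A \cap B) \geq u(A) + u(B)$,
$v(A \cup B) + v(A \cap B) \leq v(A) + v(B)$;

(g) if $A_n \downarrow A$, $u(A_n) \downarrow u(A)$, while if $A_n \uparrow A$, $v(A_n) \uparrow v(A)$.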

Proof. Properties (a), (b), (c), (d) are immediate from the definitions of u
and v. Property (e) is equivalent to $\nabla_1 \geq 0$ for u and $\Delta_1 \leq 0$ for v, while
property (f) is implied by $\nabla_2 \geq 0$ for u and $\Delta_2 \leq 0$ for v. These inequalities
were established in Proposition 3.1. Finally, from (3.3), $v(A_n) = P[\psi(A_n) \mid E_x]$.
If $A_n \uparrow A$,

(3.5)  $\psi(A_n) \uparrow \bigcup_n \psi(A_n) = \psi(A)$.

This implies the second half of (g). The first half now follows from (c).

Remark  3.1.  The  counterpart  of  (g)  with  u  and  v  interchanged  does  not  hold 
in  general. 

Remark 3.2. Properties (c), (d) and the first property in each of (a), (b), (e),
(f), (g) imply the remaining properties. All further propositions proved in this
section are consequences of Proposition 3.2 alone.

Proposition 3.3. The following inequalities hold on $\mathcal{T}$. If $A \cap B = \emptyset$, then

(a) $u(A) + u(B) \leq u(A \cup B) \leq u(A) + v(B)$,

(b) $u(A) + v(B) \leq v(A \cup B) \leq v(A) + v(B)$.

Proof. The lower bound in (a) and the upper bound in (b) follow from (f)
of Proposition 3.2. Since $A \cap B = \emptyset$, $B \subset A^c$, the complement of A. Therefore,

(3.6)  $v(B) + v(A^c \cap B^c) = v(A^c \cap B) + v(A^c \cap B^c) \geq v(A^c)$,

which is equivalent to

(3.7)  $v(B) + 1 - u(A \cup B) \geq 1 - u(A)$.

The upper bound in (a) holds, consequently. A dual argument establishes the
lower bound in (b).

Remark 3.3. The upper bounds in (a) and (b) are valid without the condition
$A \cap B = \emptyset$.

Proposition 3.4. The following inequalities hold on $\mathcal{T}$:

(a) $u(A \cup B) + u(A \cap B) \leq u(A) + v(B)$,

(b) $v(A \cup B) + v(A \cap B) \geq u(A) + v(B)$.

Proof. Since $A \cup B = B \cup (A - B)$ and since $A = (A - B) \cup (A \cap B)$, Proposition
3.3 shows that

(3.8)  $v(A \cup B) \geq u(A - B) + v(B)$

and

(3.9)  $u(A) \leq u(A - B) + v(A \cap B)$.

These two inequalities imply (b). Inequality (a) is proved by taking complements
in (b).

Proposition 3.5. If $A, B, C \in \mathcal{T}$ and $B \subset A$, then

(a) $v(A \cup C) - v(A) \leq v(B \cup C) - v(B)$,

(b) $u(A \cap C) - u(A) \leq u(B \cap C) - u(B)$.

Proof (Choquet). If $X = B \cup C$ and $Y = A$, then $X \cup Y = A \cup C$ and
$X \cap Y = B \cup (A \cap C) \supset B$. Therefore,

(3.10)  $v(A \cup C) + v(B) \leq v(X \cup Y) + v(X \cap Y) \leq v(B \cup C) + v(A)$

by Proposition 3.2. This establishes (a). Inequality (b) is derived by taking
complements in (a).

Proposition 3.6. If the $\{A_i\}$ and $\{B_i\}$ belong to $\mathcal{T}$ and $B_i \subset A_i$, then

(a) $v(\bigcup_1^\infty A_i) - v(\bigcup_1^\infty B_i) \leq \sum_1^\infty [v(A_i) - v(B_i)]$,

(b) $u(\bigcap_1^\infty A_i) - u(\bigcap_1^\infty B_i) \leq \sum_1^\infty [u(A_i) - u(B_i)]$.

Proof. The result is established by Choquet [3] for finite unions and intersections
(through induction and Proposition 3.5). Taking limits and using (g)
of Proposition 3.2 completes the proof.

Proposition 3.7. For every sequence $\{A_n\}$ in $\mathcal{T}$,

(a) $v(\liminf_n A_n) \leq \liminf_n v(A_n)$,

(b) $u(\limsup_n A_n) \geq \limsup_n u(A_n)$.

Proof. Since $A_m \supset \inf_{n \geq m} A_n$ and since $\inf_{n \geq m} A_n \uparrow \liminf_n A_n$ as $m \to \infty$,

(3.11)  $\liminf_m v(A_m) \geq \lim_{m \to \infty} v(\inf_{n \geq m} A_n) = v(\liminf_n A_n)$,

using (e) and (g) of Proposition 3.2. A dual argument proves (b).

Remark  3.4.  The  roles  of  u  and  v  cannot,  in  general,  be  interchanged  in 
Proposition  3.7. 

Proposition 3.8. Let $\mathcal{G} = \{A \in \mathcal{T} : u(A) = v(A)\}$. Then $\mathcal{G}$ is a σ-algebra and
u = v is a probability measure on $(T, \mathcal{G})$.

Proof. Clearly $\emptyset, T \in \mathcal{G}$. If $A \in \mathcal{G}$, then $A^c \in \mathcal{G}$; indeed u(A) = v(A) implies
$v(A^c) = u(A^c)$ by (c) of Proposition 3.2. If the $\{A_i\} \in \mathcal{G}$ and are disjoint, then
$\bigcup_1^\infty A_i \in \mathcal{G}$; indeed by Proposition 3.3, for any integer n > 1,

(3.12)  $u(\bigcup_{i=1}^n A_i) \geq \sum_{i=1}^n u(A_i) = \sum_{i=1}^n v(A_i) \geq v(\bigcup_{i=1}^n A_i)$.

Therefore, $u(\bigcup_1^n A_i) = v(\bigcup_1^n A_i)$. Moreover, by Propositions 3.7 and 3.2,

(3.13)  $u(\bigcup_1^\infty A_i) = u(\lim_n \bigcup_1^n A_i) \geq \limsup_n u(\bigcup_1^n A_i) = \limsup_n v(\bigcup_1^n A_i) = v(\bigcup_1^\infty A_i)$,

so that $u(\bigcup_1^\infty A_i) = v(\bigcup_1^\infty A_i)$. The fact that u = v is a probability on $(T, \mathcal{G})$
follows from Propositions 3.2 and 3.3.

Remark 3.5. This theorem links upper and lower probabilities to structural
probabilities. For Fraser’s models, the class $\mathcal{G}$ of Proposition 3.8 contains all Borel
sets. In general, however, $\mathcal{G}$ may be trivial.
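On a finite space the class of sets on which the upper and lower probabilities agree can be listed directly. The sketch below assumes a hypothetical random-set model with two disjoint focal sets; it collects the sets A with u(A) = v(A) and checks closure under complement and union, and additivity of u on disjoint members.

```python
from fractions import Fraction
from itertools import combinations

T = frozenset({0, 1, 2})
# Hypothetical random-set model with two disjoint focal sets.
focal = [(Fraction(1, 2), frozenset({0})), (Fraction(1, 2), frozenset({1, 2}))]

def v(A):
    return sum((p for p, S in focal if S & A), Fraction(0))

def u(A):
    return sum((p for p, S in focal if S <= A), Fraction(0))

def subsets(X):
    items = sorted(X)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

# Sets on which the lower and upper probabilities coincide.
G = [A for A in subsets(T) if u(A) == v(A)]

closed_under_complement = all(T - A in G for A in G)
closed_under_union = all(A | B in G for A in G for B in G)
additive = all(u(A | B) == u(A) + u(B) for A in G for B in G if not (A & B))

print(sorted(sorted(A) for A in G))
print(closed_under_complement, closed_under_union, additive)
```

Here a set belongs to the class exactly when every focal set is either contained in it or disjoint from it, so the class is generated by the two focal sets.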

Let $\mathcal{C}$ denote the vector lattice of measurable functions mapping $(T, \mathcal{T})$ into
$(R, \mathcal{R})$, the real line endowed with the σ-algebra of all Borel sets. Let
$\mathcal{C}^+ = \{f \in \mathcal{C} : f \geq 0\}$. The following definitions are abstracted from the upper and
lower risks of Section 2.

Definition 3.1. If $f \in \mathcal{C}^+$, the upper integral s(f) and the lower integral r(f)
are defined as

(3.14)  $s(f) = \int_0^\infty v[f(t) > z]\, dz$,  $r(f) = \int_0^\infty u[f(t) > z]\, dz$.

To extend the definitions to $f \in \mathcal{C}$, let $f^+ = f \vee 0$ and $f^- = (-f) \vee 0$, so that
$f = f^+ - f^-$.

Definition 3.2. If $f \in \mathcal{C}$, the upper integral s(f) and the lower integral r(f)
are defined as

(3.15)  $s(f) = s(f^+) - r(f^-)$,  $r(f) = r(f^+) - s(f^-)$,

excluding the indeterminate case $\infty - \infty$.
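Definitions 3.1 and 3.2 can be traced on a finite space. The sketch below assumes a hypothetical random-set model `focal` and a signed function `f` stored as a dictionary; it evaluates the integrals of the positive and negative parts level by level and then combines them as in (3.15).

```python
from fractions import Fraction

T = [0, 1, 2]
# Hypothetical random-set model: (probability, focal set) pairs.
focal = [(Fraction(1, 2), frozenset({0, 1})), (Fraction(1, 2), frozenset({1, 2}))]

def v(A):
    return sum((p for p, S in focal if S & A), Fraction(0))

def u(A):
    return sum((p for p, S in focal if S <= A), Fraction(0))

def integral(cap, g):
    """(3.14) for a nonnegative function g given as a dict t -> value."""
    levels = sorted({0} | set(g.values()))
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        total += (hi - lo) * cap(frozenset(t for t in T if g[t] > lo))
    return total

def parts(f):
    """Positive and negative parts f+ and f-."""
    return ({t: max(f[t], 0) for t in T}, {t: max(-f[t], 0) for t in T})

def s(f):
    pos, neg = parts(f)
    return integral(v, pos) - integral(u, neg)   # (3.15), upper integral

def r(f):
    pos, neg = parts(f)
    return integral(u, pos) - integral(v, neg)   # (3.15), lower integral

f = {0: -2, 1: 1, 2: 3}
print(r(f), s(f))
```

For this model the upper integral equals the expected maximum of `f` over the focal set and the lower integral the expected minimum, which is one way to read the frequency interpretation of Section 2.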
Definition 3.2 can also be motivated by a frequency interpretation.

Proposition 3.9. The following assertions hold for functions in $\mathcal{C}$:

(a) if r(f) and s(f) both exist, then $r(f) \leq s(f)$;

(b) $s(a + bf) = a + b\,s(f)$ if $b \geq 0$ and s(f) exists, and $s(a + bf) = a + b\,r(f)$ if $b \leq 0$ and r(f) exists;

(c) $r(a + bf) = a + b\,r(f)$ if $b \geq 0$ and r(f) exists, and $r(a + bf) = a + b\,s(f)$ if $b \leq 0$ and s(f) exists;

(d) if s(f), s(g) both exist and $f \leq g$, then $s(f) \leq s(g)$;

(e) if r(f), r(g) both exist and $f \leq g$, then $r(f) \leq r(g)$;

(f) if the $\{s(f_n)\}$ all exist and $r(f_n^-) < \infty$ for at least one n, then $f_n \uparrow f$ implies
that s(f) exists and $s(f_n) \uparrow s(f)$;

(g) if the $\{r(f_n)\}$ all exist and $r(f_n^+) < \infty$ for at least one n, then $f_n \downarrow f$ implies
that r(f) exists and $r(f_n) \downarrow r(f)$.


Proof. Assertion (a) holds for $f \in \mathcal{C}^+$ by (d) of Proposition 3.2, and hence
as stated. If $f \in \mathcal{C}^+$, $a \geq 0$, and $b \geq 0$, a change of variable in Definition 3.1
shows that

(3.16)  $s(a + bf) = a + b\,s(f)$,  $r(a + bf) = a + b\,r(f)$.

Therefore, if $a \geq 0$, $b \geq 0$, $f \in \mathcal{C}$, and s(f) exists,

(3.17)  $s(a + bf) = s(a + bf^+) - r(bf^-) = a + b\,s(f^+) - b\,r(f^-) = a + b\,s(f)$.

The other cases in assertions (b) and (c) are proved similarly.

For $f, g \in \mathcal{C}^+$, assertions (d) and (e) are immediate from (e) of Proposition 3.2.
If $f, g \in \mathcal{C}$, $f \leq g$, then $f^+ \leq g^+$, $f^- \geq g^-$, and assertions (d) and (e) follow as
stated.

To prove (f) and (g), note that if the $\{f_n\} \in \mathcal{C}^+$ and $f_n \uparrow f \in \mathcal{C}^+$, then for any
$z \in R^+$,

(3.18)  $\{f_n(t) > z\} \uparrow \bigcup_{n=1}^\infty \{f_n(t) > z\} = \{f(t) > z\}$;

consequently $s(f_n) \uparrow s(f)$ by (g) of Proposition 3.2. Similarly, if $\{f_n\} \in \mathcal{C}^+$ and
$f_n \downarrow f \in \mathcal{C}^+$, then $r(f_n) \downarrow r(f)$. Now suppose the $\{f_n\} \in \mathcal{C}$ and $f_n \uparrow f$. By the
foregoing, $s(f_n^+) \uparrow s(f^+)$ and $r(f_n^-) \downarrow r(f^-)$. Since $r(f_n^-) < \infty$ for at least one n,
$r(f^-) < \infty$; therefore s(f) exists and (f) follows. Assertion (g) is proved
analogously.

Remark 3.6. In general, the roles of r and s cannot be interchanged in (f)
and (g) above.
Proposition 3.10. Let $f, g \in \mathcal{C}$.

(a) If either $s(f^+) + s(g^+) < \infty$ or $s(f^-) + r(g^-) < \infty$, then
$r(f) + s(g) \leq s(f \vee g) + s(f \wedge g) \leq s(f) + s(g)$.

(b) If either $s(f^-) + s(g^-) < \infty$ or $r(f^+) + s(g^+) < \infty$, then
$r(f) + r(g) \leq r(f \vee g) + r(f \wedge g) \leq r(f) + s(g)$.

Proof. Let $f, g \in \mathcal{C}^+$. Since for any $z \in R^+$,

(3.19)  $\{f(t) \vee g(t) > z\} = \{f(t) > z\} \cup \{g(t) > z\}$,  $\{f(t) \wedge g(t) > z\} = \{f(t) > z\} \cap \{g(t) > z\}$,

inequalities (a) and (b) follow from Propositions 3.2 and 3.4. If $f, g \in \mathcal{C}$, then

(3.20)  $(f \vee g)^+ = f^+ \vee g^+$,  $(f \wedge g)^+ = f^+ \wedge g^+$,  $(f \vee g)^- = f^- \wedge g^-$,  $(f \wedge g)^- = f^- \vee g^-$,

and the proposition follows from the results on $\mathcal{C}^+$.

Proposition 3.11. Let $\{f_n\}$ be a sequence of functions in $\mathcal{C}$.

(a) If $g \in \mathcal{C}$, $r(g^-) < \infty$, and $f_n \geq g$ for all n, then $s(\liminf_n f_n) \leq \liminf_n s(f_n)$.

(b) If $g \in \mathcal{C}$, $r(g^+) < \infty$, and $f_n \leq g$ for all n, then $r(\limsup_n f_n) \geq \limsup_n r(f_n)$.

Proof. In (a), since $\inf_{m \geq n} f_m \geq g$, $r([\inf_{m \geq n} f_m]^-) \leq r(g^-) < \infty$ for all n,
and therefore $s(\inf_{m \geq n} f_m)$ exists for all n. Similarly, $r(f_m^-) < \infty$ for all m, so
that $s(f_m)$ exists for all m. By Proposition 3.9, $s(\liminf_n f_n)$ exists and as $n \to \infty$,

(3.21)  $\inf_{m \geq n} s(f_m) \geq s(\inf_{m \geq n} f_m) \uparrow s(\liminf_n f_n)$,

which proves (a). A dual argument establishes (b).

Let $\{A_n : A_n \in \mathcal{T},\ n = 0, \pm 1, \pm 2, \ldots\}$ be a countable partition of T, and let
$\mathcal{A}$ denote the σ-algebra generated by this partition. If $B \in \mathcal{A}$, $B = \bigcup_I A_i$, where
I is countable. Define a set function q on $\mathcal{A}$ as follows:

(3.22)  $q(A_j) = v(\bigcup_{i=j}^\infty A_i) - v(\bigcup_{i=j+1}^\infty A_i)$,  $j = 0, \pm 1, \ldots$.

More generally, if $B \in \mathcal{A}$, $B = \bigcup_I A_i$, define q(B) by

(3.23)  $q(B) = \sum_I q(A_i)$.


Lemma 3.1. If $\lim_{n \to \infty} v(\bigcup_{i=n}^\infty A_i) = 0$, then for every $B \in \mathcal{A}$, $u(B) \leq q(B) \leq v(B)$,
and q is a probability measure on $\mathcal{A}$.
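The construction of q in (3.22) and the sandwich of Lemma 3.1 can be checked on a finite partition. The sketch below assumes a hypothetical random-set model on four points, partitioned into singletons $A_j = \{j\}$; the tail unions are then the sets $\{j, j+1, \ldots\}$ and the limit condition of the lemma holds trivially.

```python
from fractions import Fraction
from itertools import combinations

N = 4   # partition of {0, 1, 2, 3} into singletons A_j = {j}
# Hypothetical random-set model: (probability, focal set) pairs.
focal = [(Fraction(1, 2), frozenset({0, 1})),
         (Fraction(1, 4), frozenset({1, 2, 3})),
         (Fraction(1, 4), frozenset({2}))]

def v(A):
    return sum((p for p, S in focal if S & A), Fraction(0))

def u(A):
    return sum((p for p, S in focal if S <= A), Fraction(0))

def tail(j):
    """Union of A_i for i >= j."""
    return frozenset(range(j, N))

# (3.22): successive differences of v along the tail unions.
q = {j: v(tail(j)) - v(tail(j + 1)) for j in range(N)}

def q_of(B):
    """(3.23): q extended additively to unions of partition cells."""
    return sum((q[j] for j in B), Fraction(0))

all_sets = [frozenset(c) for k in range(N + 1) for c in combinations(range(N), k)]
sandwich = all(u(B) <= q_of(B) <= v(B) for B in all_sets)

print(q, sandwich)
```

In this example q is the distribution of the largest element of the focal set, which makes the inequalities $u \leq q \leq v$ easy to see directly.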


Proof. To verify that q is a probability on $\mathcal{A}$, note first that q is countably
additive by definition. Since v is monotone, $q(A_j) \geq 0$ for all j and hence
$q(B) \geq 0$ for $B \in \mathcal{A}$. Also

(3.24)  $q(T) = \sum_{i=-\infty}^{\infty} q(A_i) = \lim_{m, n \to \infty} \sum_{i=-m}^{n} q(A_i) = \lim_{m, n \to \infty} [v(\bigcup_{i=-m}^\infty A_i) - v(\bigcup_{i=n+1}^\infty A_i)] = 1$,

by Proposition 3.2 and the hypothesis of the lemma.

From Proposition 3.3, applied to (3.22),

(3.25)  $q(A_j) \leq v(A_j)$,  $j = 0, \pm 1, \pm 2, \ldots$.

Let $C = \bigcup_J A_j$, with J a finite set of natural numbers, and suppose that
$q(C) \leq v(C)$. If k is a natural number smaller than any element of J,

(3.26)  $q(A_k \cup C) = q(A_k) + q(C) \leq v(\bigcup_{i=k}^\infty A_i) - v(\bigcup_{i=k+1}^\infty A_i) + v(C) \leq v(A_k \cup C)$,

the last inequality coming from (f) of Proposition 3.2. Starting with (3.25) and
applying (3.26) a finite number of times shows that for any finite J,

(3.27)  $q(\bigcup_J A_j) \leq v(\bigcup_J A_j)$.

Taking limits establishes the inequality $q(B) \leq v(B)$ for any $B \in \mathcal{A}$. Finally, if
$B \in \mathcal{A}$, then $B^c \in \mathcal{A}$ and $q(B^c) \leq v(B^c)$, hence by Proposition 3.2, $u(B) \leq q(B)$.

A function $f \in \mathcal{C}$ is elementary if it can be represented in the form

(3.28)  $f(t) = \sum_{j=-\infty}^{\infty} a_j I_{A_j}(t)$,

where $\{A_n : A_n \in \mathcal{T},\ n = 0, \pm 1, \pm 2, \ldots\}$ is a partition of T when repetitions are
excluded, $a_0 = 0$, and $a_{j+1} - a_j \geq \delta > 0$ for each j and some δ. If all but a
finite number of the $\{A_n\}$ equal $\emptyset$, then f is a simple function.

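Lemma 3.2. If $f \in \mathcal{C}$ is elementary, with representation (3.28), and if
$|s(f)| < \infty$, then

(a) $\lim_{n \to \infty} a_n v(\bigcup_{i=n}^\infty A_i) = 0$,  $\lim_{n \to -\infty} a_n u(\bigcup_{i=-\infty}^{n} A_i) = 0$,

(b) $s(f) = \sum_{j=-\infty}^{\infty} a_j q(A_j)$.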


Proof. Under the hypotheses of the lemma,

(3.29)  $f^+(t) = \sum_{j=1}^{\infty} a_j I_{A_j}(t)$,  $f^-(t) = \sum_{j=-1}^{-\infty} (-a_j) I_{A_j}(t)$,

and $s(f^+) < \infty$, $r(f^-) < \infty$. From Definition 3.1,

(3.30)  $s(f^+) = \sum_{j=1}^{\infty} (a_j - a_{j-1})\, v(\bigcup_{i=j}^\infty A_i) = \lim_{n \to \infty} [\sum_{j=1}^{n} a_j q(A_j) + a_n v(\bigcup_{i=n+1}^\infty A_i)]$.

Therefore, since $s(f^+) < \infty$,

(3.31)  $\lim_{n \to \infty} (a_n - a_{n-1})\, v(\bigcup_{i=n}^\infty A_i) = 0$,

and since $a_n v(\bigcup_{i=n+1}^\infty A_i) \geq 0$,

(3.32)  $\lim_{n \to \infty} \sum_{j=1}^{n} a_j q(A_j) \leq s(f^+) < \infty$.

Moreover, because of (3.31), $\lim_{n \to \infty} v(\bigcup_{i=n}^\infty A_i) = 0$ and

(3.33)  $a_n v(\bigcup_{i=n}^\infty A_i) = a_n \sum_{j=n}^{\infty} q(A_j) \leq \sum_{j=n}^{\infty} a_j q(A_j)$.

From these relations follow the first part of (a) and

(3.34)  $s(f^+) = \sum_{j=1}^{\infty} a_j q(A_j)$.

A similar argument on $r(f^-)$ establishes the other part of (a) and

(3.35)  $r(f^-) = \sum_{j=-1}^{-\infty} (-a_j) [u(\bigcup_{i=-\infty}^{j} A_i) - u(\bigcup_{i=-\infty}^{j-1} A_i)] = \sum_{j=-1}^{-\infty} (-a_j) q(A_j)$,

the second equality coming from Proposition 3.2. Finally, (b) is a consequence
of (3.34) and (3.35).


Remark  3.7.  From  Lemma  3.1,  the  set  function  q  appearing  in  Lemma  3.2 
is  a  probability.  Thus,  (b)  represents  s(f)  as  an  expectation. 

Proposition 3.12. Let $f, g \in \mathcal{C}$ be such that f + g is defined.

(a) If either $s(f^+) + s(g^+) < \infty$ or $s(f^-) + r(g^-) < \infty$, then
$r(f) + s(g) \leq s(f + g) \leq s(f) + s(g)$.

(b) If either $s(f^-) + s(g^-) < \infty$ or $r(f^+) + s(g^+) < \infty$, then
$r(f) + r(g) \leq r(f + g) \leq r(f) + s(g)$.
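The inequalities of Proposition 3.12 can be spot-checked numerically on a finite space. The sketch below is illustrative only: it assumes a hypothetical random-set model `focal` and uses the finite-space shortcut that the upper integral is the expected maximum, and the lower integral the expected minimum, of the function over the focal set. Integer-valued functions and exact fractions avoid rounding issues.

```python
from fractions import Fraction
import random

T = range(5)
# Hypothetical random-set model: (probability, focal set) pairs.
focal = [(Fraction(1, 3), (0, 1)), (Fraction(1, 3), (1, 2, 3)), (Fraction(1, 3), (4,))]

def s(f):
    """Upper integral on a finite space: expected maximum over the focal set."""
    return sum(p * max(f[t] for t in S) for p, S in focal)

def r(f):
    """Lower integral on a finite space: expected minimum over the focal set."""
    return sum(p * min(f[t] for t in S) for p, S in focal)

def check_prop_3_12(f, g):
    h = [a + b for a, b in zip(f, g)]
    part_a = r(f) + s(g) <= s(h) <= s(f) + s(g)
    part_b = r(f) + r(g) <= r(h) <= r(f) + s(g)
    return part_a and part_b

rng = random.Random(1)
ok = all(check_prop_3_12([rng.randint(-5, 5) for _ in T],
                         [rng.randint(-5, 5) for _ in T])
         for _ in range(1000))
print(ok)
```

The upper integral is subadditive but not additive: for indicator-like functions supported on different points of the same focal set, s(f + g) can be strictly smaller than s(f) + s(g).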


Proof. (i) Let $f, g \in \mathcal{C}$ be elementary functions to which the hypotheses of
(a) apply. Then s(f), s(g) and r(f) exist. Assume that $|s(f + g)| < \infty$. The sum
f + g may be represented in the form (3.28). If e(·) denotes expectation with
respect to q, then by Remark 3.7 and the preceding lemmas,

(3.36)  $s(f + g) = e(f + g) = e(f) + e(g) \leq s(f) + s(g)$.

(ii) If $f, g \in \mathcal{C}$, each may be approximated from below by a monotone increasing
sequence of elementary functions. Under the hypotheses of (a) and if
$|s(f + g)| < \infty$, the result of (i) applies to approximating elementary functions.
Taking monotone limits establishes

(3.37)  $s(f + g) \leq s(f) + s(g)$.

Special cases. If $s(f + g) = -\infty$ and the hypotheses of (a) hold, then (3.37) is
trivial. If $s(f + g) = \infty$, then $s(f^+ + g^+) \geq s[(f + g)^+] = \infty$. Since $f^+$,
$g^+ \in \mathcal{C}^+$, each may be approximated from below by a monotone increasing
sequence of simple functions, each of which is in $\mathcal{C}^+$ and is bounded. The result
in (i) for elementary functions applies; taking monotone limits shows that
$s(f^+) + s(g^+) \geq s(f^+ + g^+) = \infty$. Thus, one of s(f), s(g) is $\infty$ and (3.37) is
valid.

In summary, therefore, if s(f + g) exists and the hypotheses of (a) hold, then
(3.37) is valid. Under the same assumptions,

(3.38)  $s(g) = s(f + g - f) \leq s(f + g) + s(-f)$,

which, by Proposition 3.9, is equivalent to the left inequality in (a).

(iii) Suppose $f, g \in \mathcal{C}$ and the hypotheses of (b) hold, ensuring that r(f),
r(g), s(g) exist. Assume also that r(f + g) exists. Since $r(f + g) = -s(-f - g)$,
the inequalities of (b) follow from (ii).

(iv) To complete the proof, it is necessary to show that s(f + g), r(f + g)
exist under the hypotheses of (a) and (b), respectively. Since $s(f^+ + g^+)$,
$r(f^- + g^-)$ exist, it follows from (ii) and (iii) that

(3.39)  $s[(f + g)^+] \leq s(f^+ + g^+) \leq s(f^+) + s(g^+)$,  $r[(f + g)^-] \leq r(f^- + g^-) \leq s(f^-) + r(g^-)$.

Thus s(f + g) exists under the hypotheses of (a). A dual argument shows that
r(f + g) exists in (b).

Corollary 3.1. Let $f \in \mathcal{C}$.

(a) If s(f) exists, $|s(f)| \leq s(|f|)$.

(b) If r(f) exists, $|r(f)| \leq s(|f|)$.

Define sets $K, K^+ \subset R^n$ as follows:

(3.40)  $K = \{x \in R^n : x_1 \geq 0, x_2 \geq 0, \ldots, x_n \geq 0\}$,  $K^+ = \{x \in R^n : x_1 > 0, x_2 > 0, \ldots, x_n > 0\}$.


Proposition 3.13. Let $h : K \to R$ be continuous and concave in K and such
that h(x) > 0 and $h(\lambda x) = \lambda h(x)$ for every $x \in K^+$ and $\lambda \geq 0$. Let $f_1, f_2, \ldots, f_n \in \mathcal{C}^+$
be such that $s(f_i) < \infty$ for $1 \leq i \leq n$. Then

(3.41)  $s[h(f_1(t), \ldots, f_n(t))] \leq h[s(f_1), \ldots, s(f_n)]$.

Proof. By Propositions 3.9 and 3.12, s(·) is an increasing gauge on $\mathcal{C}^+$. The
theorem follows from a general result due to Bourbaki (see Berge [2], p. 212).

If $f \in \mathcal{C}$, define $\|f\|_p$ by

(3.42)  $\|f\|_p = [s(|f|^p)]^{1/p}$, $1 \leq p < \infty$;  $\|f\|_\infty = \sup\{z : v(|f(t)| > z) > 0\}$.
Proposition 3.14. Let $f, g \in \mathcal{C}$. Then

(a) $0 \leq \|f\|_p \leq \|f\|_q$ if $1 \leq p \leq q \leq \infty$,

(b) $v(|f(t)| \neq 0) = 0$ if and only if $\|f\|_p = 0$, for $1 \leq p \leq \infty$,

(c) $\|af\|_p = a\|f\|_p$ if $a \in R^+$ and $1 \leq p \leq \infty$,

(d) $\|fg\|_r \leq \|f\|_p \|g\|_q$ if $1 \leq p, q, r \leq \infty$ and $r^{-1} = p^{-1} + q^{-1}$,

(e) $\|f + g\|_p \leq \|f\|_p + \|g\|_p$ if $1 \leq p \leq \infty$.

Proof. Assertions (b) and (c) are immediate. Apart from special cases, (d) and (e) follow from Proposition 3.13, or may be proved from the Hölder and Minkowski inequalities along the lines of Proposition 3.12. Assertion (a) is a consequence of (d).
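As an illustration of how assertion (d) can be obtained from Proposition 3.13, the following sketch treats the case of finite $p$, $q$, $r$ with $s(|f|^p)$ and $s(|g|^q)$ finite. The particular choice of $h$ is an assumption made here for illustration; the text does not specify it.

```latex
% Sketch: Hoelder-type inequality (d) from Proposition 3.13, finite p, q, r.
% Take n = 2 and the weighted geometric mean
\[
  h(x_1, x_2) = x_1^{r/p}\, x_2^{r/q}, \qquad \frac{r}{p} + \frac{r}{q} = 1 .
\]
% h is continuous and concave on K, positive on K^+, and
% h(\lambda x) = \lambda h(x) for \lambda \ge 0.
% Apply (3.41) with f_1 = |f|^p and f_2 = |g|^q:
\[
  s\bigl(|fg|^r\bigr)
  = s\bigl[h(|f|^p, |g|^q)\bigr]
  \le h\bigl[s(|f|^p),\, s(|g|^q)\bigr]
  = \bigl[s(|f|^p)\bigr]^{r/p} \bigl[s(|g|^q)\bigr]^{r/q} .
\]
% Taking r-th roots and using the definition of the norms gives
\[
  \|fg\|_r \le \|f\|_p \, \|g\|_q .
\]
```

The cases in which one of $p$, $q$, $r$ is infinite are among the special cases that need separate treatment.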

4.  Minimax  decisions 

Conditions for the existence of minimax decisions, as defined in Section 2, are provided by the theorem below. Examples of minimax procedures for a distribution-free estimation problem follow.

Proposition 4.1. Let $T$, $D$ be compact metric spaces and let $\ell: T \times D \to R$ be continuous. Then

(a) $s(\ell, d)$ and $r(\ell, d)$ are uniformly continuous on $D$,

(b) the suprema and infima over $D$ of $r(\ell, d)$ are attained.

Proof. Let $m(\cdot, \cdot)$ denote the metric on $D$. Since $T \times D$ is compact metric, $\ell(t, d)$ is uniformly continuous on $T \times D$. Therefore, to every $\varepsilon > 0$ there corresponds an $\eta > 0$ such that

(4.1)  $m(d, d') < \eta \Rightarrow |\ell(t, d) - \ell(t, d')| < \varepsilon$

for every $t \in T$. Applying Proposition 3.9, parts (b), (c), (d), (e), to the right side of (4.1) establishes

(4.2)  $|s(\ell, d) - s(\ell, d')| < \varepsilon$, $|r(\ell, d) - r(\ell, d')| < \varepsilon$,

hence (a) and (b).

Example. An example of the statistical model described in Section 2 is the nonparametric version of the two sample location shift model. If $(x_1, \ldots, x_m)$ are the observations of the first sample and $(y_1, \ldots, y_n)$ are the observations of the second sample, the model can be written in the form

(4.3)  $x_i = F^{-1}(u_i)$, $1 \le i \le m$,
       $y_j = \mu + F^{-1}(u_{m+j})$, $1 \le j \le n$,

where $(u_1, \ldots, u_{m+n})$ are realizations of independent, identically distributed random variables, each uniformly distributed on $[0, 1]$, $F \in \mathscr{F}$, the class of all continuous distribution functions on the real line, $\mu \in \Omega = (-\infty, \infty)$, and $(\mu, F)$ is the unknown parameter. Equations (4.3) are of the general form (2.1).
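To make the structure of model (4.3) concrete, the following is a minimal simulation sketch. The function name `simulate_two_sample`, the seed handling, and the use of the standard normal inverse distribution function as $F^{-1}$ are illustrative assumptions; the model itself leaves $F$ unspecified.

```python
import random
from statistics import NormalDist

def simulate_two_sample(m, n, mu, inv_cdf, seed=0):
    """Draw (x_1..x_m, y_1..y_n) from model (4.3) for a chosen inverse CDF.

    inv_cdf plays the role of F^{-1}; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # u_1, ..., u_{m+n}: independent uniforms on [0, 1).
    # (The value 0.0 has negligible probability and is not guarded against.)
    u = [rng.random() for _ in range(m + n)]
    x = [inv_cdf(ui) for ui in u[:m]]        # x_i = F^{-1}(u_i)
    y = [mu + inv_cdf(ui) for ui in u[m:]]   # y_j = mu + F^{-1}(u_{m+j})
    return x, y

x, y = simulate_two_sample(m=3, n=4, mu=2.0, inv_cdf=NormalDist().inv_cdf)
print(len(x), len(y))
```

Holding the seed fixed and changing only `mu` shifts every $y_j$ by the same amount and leaves the $x_i$ unchanged, which is the location shift structure of the model.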


Let $\{d_{ij} = y_j - x_i,\ 1 \le i \le m,\ 1 \le j \le n\}$ and let $a_1 < a_2 < \cdots < a_{M-1}$, where $M = mn + 1$, denote the ordered $\{d_{ij}\}$. Under the original model, the strict ordering will be possible with probability one. Let $\Omega_1 = (-\infty, a_1)$, let $\Omega_i = (a_{i-1}, a_i)$ for $2 \le i \le M - 1$, and let $\Omega_M = (a_{M-1}, \infty)$. For arbitrary $A \subset \Omega$, define


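(4.4)  $\delta_v(i, A) = 1$ if $A \cap \Omega_i \ne \emptyset$, and $\delta_v(i, A) = 0$ otherwise,

and

(4.5)  $\delta_u(i, A) = 1$ if $\Omega_i \subset A$, and $\delta_u(i, A) = 0$ otherwise.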


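Then, as shown in [1], for arbitrary $A \subset \Omega$,

(4.6)  $v(A \times \mathscr{F}) = \frac{1}{M} \sum_{i=1}^{M} \delta_v(i, A)$,
       $u(A \times \mathscr{F}) = \frac{1}{M} \sum_{i=1}^{M} \delta_u(i, A)$.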
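The counting formulas (4.4) to (4.6) can be sketched in a few lines of Python. The sketch below assumes that $A$ is an open interval `(lo, hi)` with `lo < hi` and that the pairwise differences are distinct; the function names are hypothetical and chosen only for this illustration.

```python
def pairwise_differences(x, y):
    """Ordered differences a_1 < ... < a_{M-1} of y_j - x_i."""
    return sorted(yj - xi for xi in x for yj in y)

def upper_lower_probability(x, y, lo, hi):
    """Return (v, u) for A = (lo, hi), following (4.4)-(4.6).

    Assumes lo < hi and that the differences y_j - x_i are distinct.
    """
    a = pairwise_differences(x, y)
    # Endpoints of Omega_1, ..., Omega_M, where M = mn + 1.
    cuts = [float("-inf")] + a + [float("inf")]
    M = len(cuts) - 1
    hits = 0    # number of Omega_i that intersect A
    inside = 0  # number of Omega_i contained in A
    for left, right in zip(cuts[:-1], cuts[1:]):
        if max(left, lo) < min(right, hi):   # two open intervals overlap
            hits += 1
        if lo <= left and right <= hi:       # (left, right) is a subset of (lo, hi)
            inside += 1
    return hits / M, inside / M

# Differences are 1, 2, 3, 4, so M = 5.
v, u = upper_lower_probability([0, 1], [2, 4], 1.5, 3.5)
print(v, u)
```

In this small case the interval $(1.5, 3.5)$ meets three of the five sets $\Omega_i$ and contains one of them, so the upper probability is 3/5 and the lower probability is 1/5.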


This collection of upper and lower probabilities determines the upper and lower risks if the loss function does not depend upon $F$. For example, suppose that it is desired to estimate $\mu$, that $h: R^+ \to R^+$ is strictly monotone increasing, and that the loss function of interest is

(4.7)  $\ell(\mu, d) = h(|\mu - d|)$ if $|\mu - d| \le c$, and $\ell(\mu, d) = h(c)$ if $|\mu - d| > c$,


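where $c > a_{M-1} - a_1$. Let $B_1 = [a_{M-1} - c, \tfrac{1}{2}(a_1 + a_2)]$, let $B_i = [\tfrac{1}{2}(a_{i-1} + a_i), \tfrac{1}{2}(a_i + a_{i+1})]$ for $2 \le i \le M - 2$, and let $B_{M-1} = [\tfrac{1}{2}(a_{M-2} + a_{M-1}), a_1 + c]$. Then for $\ell$ defined by (4.7),

(4.8)  $s(\ell, d) = \frac{1}{M} \Bigl[ \sum_{j \ne i} h(|a_j - d|) + 2h(c) \Bigr]$

if $d \in B_i$ for $1 \le i \le M - 1$, and

$s(\ell, d) > s(\ell, a_{M-1} - c)$ if $d < a_{M-1} - c$,
$s(\ell, d) > s(\ell, a_1 + c)$ if $d > a_1 + c$.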

Similar expressions may be found for $r(\ell, d)$.
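The upper risk formula (4.8) can be evaluated numerically as follows. This is a sketch under stated assumptions: the name `upper_risk` is hypothetical, the sum in (4.8) is taken over $1 \le j \le M - 1$ with $j \ne i$, and the index $i$ with $d \in B_i$ is located by picking the ordered difference nearest to $d$, which is equivalent to the definition of the $B_i$ because their interior endpoints are midpoints of adjacent differences.

```python
def upper_risk(a, d, c, h):
    """Evaluate s(l, d) from (4.8) for d in [a_{M-1} - c, a_1 + c].

    a : the differences a_1, ..., a_{M-1} (assumed distinct)
    c : truncation constant, assumed to satisfy c > a_{M-1} - a_1
    h : strictly increasing function on the nonnegative reals
    """
    a = sorted(a)
    M = len(a) + 1
    if c <= a[-1] - a[0]:
        raise ValueError("(4.8) assumes c > a_{M-1} - a_1")
    if not (a[-1] - c <= d <= a[0] + c):
        raise ValueError("d lies outside the range covered by (4.8)")
    # d is in B_i exactly when a_i is a nearest difference to d.
    # At a midpoint the two candidate indices give the same value.
    i = min(range(len(a)), key=lambda j: abs(a[j] - d))
    total = sum(h(abs(aj - d)) for j, aj in enumerate(a) if j != i)
    return (total + 2 * h(c)) / M

# With a = (1, 2, 3, 4), c = 4 and h(x) = x, the value at d = 2.5 is
# (1.5 + 0.5 + 1.5 + 2 * 4) / 5.
print(upper_risk([1, 2, 3, 4], d=2.5, c=4.0, h=lambda t: t))
```

The two terms $h(c)$ correspond to the unbounded sets $\Omega_1$ and $\Omega_M$, on which the truncated loss reaches its maximum.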

In particular, suppose that $h(x) = x$. Then, if $M$ is even, $s(\ell, d)$ is minimized by any $d \in [\tfrac{1}{2}(a_{(M/2)-1} + a_{M/2}), \tfrac{1}{2}(a_{M/2} + a_{(M/2)+1})]$, while if $M$ is odd, the minimizing value is $d = \tfrac{1}{2}(a_{(M-1)/2} + a_{[(M-1)/2]+1})$. This class of minimax estimates for $\mu$ includes the Hodges-Lehmann estimate $\mathrm{median}\{a_1, \ldots, a_{M-1}\}$.
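For $h(x) = x$, the minimizing set described above can be computed directly and compared with the Hodges-Lehmann estimate. The sketch below is illustrative only; the function names are hypothetical, the differences are assumed distinct, and at least two differences are required so that the indices in the formulas exist.

```python
from statistics import median

def ordered_differences(x, y):
    """Ordered differences a_1 < ... < a_{M-1} of y_j - x_i."""
    return sorted(yj - xi for xi in x for yj in y)

def minimizing_set_abs_loss(x, y):
    """Return (lo, hi), the interval of minimax estimates when h(x) = x."""
    a = ordered_differences(x, y)
    if len(a) < 2:
        raise ValueError("need at least two pairwise differences")
    M = len(a) + 1
    if M % 2 == 0:
        k = M // 2                          # 1-based index of the middle a_k
        lo = 0.5 * (a[k - 2] + a[k - 1])    # (a_{k-1} + a_k) / 2
        hi = 0.5 * (a[k - 1] + a[k])        # (a_k + a_{k+1}) / 2
    else:
        k = (M - 1) // 2
        lo = hi = 0.5 * (a[k - 1] + a[k])   # (a_k + a_{k+1}) / 2
    return lo, hi

def hodges_lehmann(x, y):
    """Median of the pairwise differences y_j - x_i."""
    return median(ordered_differences(x, y))

x, y = [0], [1, 2, 5]                 # a = (1, 2, 5), M = 4
lo, hi = minimizing_set_abs_loss(x, y)
print(lo, hi, hodges_lehmann(x, y))
```

When $M$ is odd the interval collapses to a single point, which coincides with the median of the $M - 1$ differences.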

If $h(x) = x^2$, the minimax estimate for $\mu$ can be described as follows. Let $m_i = (M - 2)^{-1} \sum_{j \ne i} a_j$. If there exists a $k$, $1 \le k \le M - 1$, such that $m_k \in B_k$, $s(\ell, d)$ is minimized by $d = m_k$. Otherwise, there will exist a $k$, $1 \le k \le M - 1$, such that $m_k > \tfrac{1}{2}(a_k + a_{k+1}) > m_{k+1}$; in this event $s(\ell, d)$ is minimized by $d = \tfrac{1}{2}(a_k + a_{k+1})$. Viewed as functions of $(x_1, \ldots, x_m, y_1, \ldots, y_n)$, these minimax decisions are minimax procedures in the sense of Definition 2.2.
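The two-case rule for $h(x) = x^2$ can be sketched as a short search. This sketch assumes that $m_k$ is the mean of the $a_j$ with $j \ne k$, which is the value minimizing the quadratic obtained from (4.8) on $B_k$; the function name is hypothetical and the differences are assumed distinct.

```python
def minimax_estimate_squared_loss(x, y):
    """Minimax estimate of the shift when h(x) = x**2 (sketch).

    Assumes m_k is the mean of the differences a_j with j != k.
    """
    a = sorted(yj - xi for xi in x for yj in y)   # a_1 < ... < a_{M-1}
    n_a = len(a)                                  # n_a = M - 1
    if n_a < 2:
        raise ValueError("need at least two pairwise differences")
    total = sum(a)
    means = [(total - ak) / (n_a - 1) for ak in a]             # m_k
    mids = [0.5 * (a[k] + a[k + 1]) for k in range(n_a - 1)]   # endpoints of B_k
    # Case 1: some m_k lies in its own interval B_k. The outer endpoints of
    # B_1 and B_{M-1} never bind, because every m_k lies in [a_1, a_{M-1}].
    for k in range(n_a):
        left = mids[k - 1] if k > 0 else float("-inf")
        right = mids[k] if k < n_a - 1 else float("inf")
        if left <= means[k] <= right:
            return means[k]
    # Case 2: the minimum sits at a midpoint between adjacent differences.
    for k in range(n_a - 1):
        if means[k] > mids[k] > means[k + 1]:
            return mids[k]
    raise RuntimeError("no minimizer found; check that differences are distinct")

print(minimax_estimate_squared_loss([0, 1], [2, 4]))
```

For the samples shown, the differences are 1, 2, 3, 4, no $m_k$ falls in its own $B_k$, and the second case returns the midpoint 2.5.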